INTRODUCTION:

Healthcare systems worldwide have increasingly adopted data-driven approaches to manage rising patient volumes, optimize constrained resources, and support evidence-based administrative and clinical decisions. Modern hospitals function as complex, interconnected ecosystems in which clinical, operational, financial, and administrative processes continuously interact to influence patient outcomes. As medical conditions grow more complicated and patient expectations continue to rise, institutions face intensified challenges related to capacity management, risk mitigation, care coordination, and cost regulation¹. These pressures are particularly evident in rapidly expanding healthcare markets, where timely admissions, efficient patient routing, and continuous performance monitoring have become essential to organizational stability. In such contexts, high-quality data has emerged as a critical asset, empowering administrators to convert raw information into structured intelligence and strategic operational insight².

The growth of healthcare analytics has been strongly supported by the availability of large-scale datasets that capture demographic details, admission patterns, severity indicators, ward allocation, cost parameters, and service outcomes². The dataset utilized in the present investigation—sourced from the Kaggle repository under the title AV Healthcare Analytics II—contains more than 300,000 patient records encompassing multiple dimensions of hospital functioning. Its size and diversity enable the detailed exploration of operational patterns, inefficiencies, and patient-flow dynamics that are often unavailable or under-reported in traditional health audits. Such datasets facilitate the identification of latent relationships between clinical and administrative variables, strengthening both planning and policy formulation.

Predictive modelling has become an integral component of modern hospital management because of its ability to forecast parameters such as patient length of stay (LOS), readmission probability, care complexity, and departmental workload. LOS prediction, in particular, has received significant attention due to its relevance in bed turnover optimization, discharge planning, staff scheduling, and resource mobilization. Underestimation of LOS tends to create bottlenecks, overcrowding, and elevated waiting times, whereas overestimation leads to inefficient resource utilization and inflated operational costs³. Comparing machine-learning algorithms such as Logistic Regression and Random Forest enables the assessment of their suitability for real-world hospital environments, where interpretability, accuracy, and generalizability are critical considerations⁴.

In addition to predictive analytics, performance indicators—including bed occupancy rate, staff efficiency, patient satisfaction, operational cost index, resource utilization, and service delivery time—offer essential insights into systemic functioning. These indicators reflect how well a hospital can balance patient needs with organizational capacity. High occupancy signals strong demand but may also indicate strain on infrastructure, whereas elevated readmission rates often point to gaps in treatment quality or post-discharge management. Evaluating these indicators alongside predictive outputs allows for a more comprehensive understanding of institutional strengths and inefficiencies⁵.

The dataset also enables visualization of monthly incidence trends and department-wise treatment success rates, both of which are vital for operational forecasting. Monthly incidence analysis helps predict seasonal surges and plan workforce mobilization, procurement, and preventive action. Departmental performance comparisons reveal structural disparities, differences in clinical workflows, and potential gaps in protocol adherence. These visual analytics outputs enhance decision-making by providing administrators with clear, interpretable summaries of system behavior.

The integration of predictive modelling, performance indicators, and visual dashboards represents the emerging standard in healthcare analytics. Together, they support accurate forecasting, continuous monitoring, and targeted quality improvement. Healthcare institutions increasingly rely on such multidimensional frameworks to transition from reactive management toward proactive, data-driven operational strategy⁶.

Against this background, the present study aimed to evaluate hospital operational performance, develop and compare predictive models for LOS, and generate management-oriented visualizations of departmental and temporal performance indicators, using machine-learning models and hospital workflow indicators derived from the AV: Healthcare Analytics II dataset.

METHODS:

Dataset:

A healthcare management dataset was obtained from Kaggle (AV: Healthcare Analytics II, Analytics Vidhya Hackathon)⁷ and uploaded to Google Colab through the interactive file-upload interface. The CSV file was read with a loading script, and its structure, dimensions, and variable names were displayed to confirm successful import. No manual preprocessing was applied at this stage, so the original characteristics of the dataset were preserved.
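
The import check described above can be sketched as follows. This is a minimal illustration rather than the script used in the study: the in-memory sample and its column names (`patient_id`, `department`, `severity`, `stay_days`) are hypothetical stand-ins for the Kaggle CSV, and `load_and_describe` is an illustrative helper name.

```python
import io

import pandas as pd

# Hypothetical three-row sample standing in for the Kaggle CSV file.
# The column names are illustrative assumptions, not the dataset's schema.
SAMPLE_CSV = """patient_id,department,severity,stay_days
1,A,minor,4
2,B,moderate,12
3,A,extreme,27
"""


def load_and_describe(source):
    """Read a CSV and report its dimensions and variable names."""
    df = pd.read_csv(source)
    summary = {
        "n_rows": df.shape[0],
        "n_columns": df.shape[1],
        "columns": list(df.columns),
    }
    return df, summary


df, summary = load_and_describe(io.StringIO(SAMPLE_CSV))
print(summary)
```

In a Colab session, the path of the uploaded file would be passed to `load_and_describe` in place of the in-memory sample.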

Hospital Performance Indicators and Operational Efficiency Metrics:

Ten healthcare management parameters (bed occupancy, length of stay, readmission rate, patient satisfaction, operational cost index, staff efficiency, resource utilization, prediction confidence, risk score, and service delivery time) were programmed as dynamically updating KPI labels. The indicators were refreshed at each iteration to demonstrate how real-time hospital dashboards can be generated from large operational datasets.
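
The text does not state how each indicator was computed. As a hedged illustration only, three of the ten indicators can be expressed with their conventional definitions: occupied bed-days as a share of available bed-days, inpatient days per discharge, and readmissions per discharge. All function names and input values below are hypothetical and are not taken from the study.

```python
def bed_occupancy_rate(occupied_bed_days, available_beds, days_in_period):
    """Occupied bed-days as a percentage of available bed-days."""
    return 100.0 * occupied_bed_days / (available_beds * days_in_period)


def average_length_of_stay(total_inpatient_days, discharges):
    """Mean number of inpatient days per discharged patient."""
    return total_inpatient_days / discharges


def readmission_rate(readmissions, discharges):
    """Readmissions as a percentage of discharges."""
    return 100.0 * readmissions / discharges


# Hypothetical inputs for a 100-bed unit observed over 30 days.
kpis = {
    "bed_occupancy_pct": bed_occupancy_rate(2550, 100, 30),
    "avg_los_days": average_length_of_stay(2100, 300),
    "readmission_pct": readmission_rate(45, 300),
}

for label, value in kpis.items():
    print(f"{label}: {value:.1f}")
```

A dashboard loop would recompute such a dictionary at each refresh and update the corresponding labels.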

Monthly Incidence Trends:

Monthly incidence data were extracted from the healthcare analytics dataset, and the values representing the number of newly reported cases per month were organized chronologically. The incidence counts were processed without transformation to preserve the natural temporal progression. A simple time-series line plot was constructed using Matplotlib, where months were assigned to the x-axis and corresponding case numbers were plotted on the y-axis. Markers were added to emphasize monthly values, and a grid was incorporated to enhance readability. The figure size was standardized to maintain visual clarity, and the layout was automatically adjusted to prevent label overlap.
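
The plotting steps described above can be sketched with Matplotlib as follows. The monthly counts are hypothetical placeholders, not the study's data, and the figure size of 8 by 4 inches is an assumption, since the standardized size is not reported.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; omit in a notebook to display inline
import matplotlib.pyplot as plt

# Hypothetical monthly case counts, for illustration only.
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
cases = [120, 140, 165, 210, 255, 300]


def plot_monthly_incidence(months, cases, figsize=(8, 4)):
    """Line plot of untransformed monthly counts with markers and a grid."""
    fig, ax = plt.subplots(figsize=figsize)
    ax.plot(months, cases, marker="o")  # markers emphasize each monthly value
    ax.set_xlabel("Month")
    ax.set_ylabel("Number of new cases")
    ax.set_title("Monthly incidence trend")
    ax.grid(True)  # grid for readability
    fig.tight_layout()  # adjust layout to prevent label overlap
    return fig, ax


fig, ax = plot_monthly_incidence(months, cases)
```

Calling `fig.savefig("monthly_incidence.png")` afterwards would write the figure to disk; the file name is arbitrary.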

Department-wise Treatment Success Rate:

Department-wise treatment outcome data were compiled from the dataset and categorized according to clinical discipline. Success rates were calculated as the percentage of patients who achieved full recovery or significant clinical improvement following treatment. These values were arranged departmentally and plotted as a bar chart using Matplotlib. Each department was assigned a categorical position on the x-axis, while corresponding success percentages were plotted on the y-axis. Uniform figure dimensions were selected for consistency, and the layout was optimized to ensure accurate alignment of labels. The visualization was generated without scaling transformations to preserve the authenticity of observed patterns.
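
The success-rate calculation defined above (the percentage of patients with full recovery or significant improvement) can be sketched as follows. The records, the outcome labels, and the function name are hypothetical; the dataset's actual outcome coding is not described in the text.

```python
from collections import defaultdict

# Outcome labels counted as treatment success (assumed labels).
SUCCESS_OUTCOMES = {"recovered", "improved"}

# Hypothetical (department, outcome) records, for illustration only.
records = [
    ("Cardiology", "recovered"),
    ("Cardiology", "improved"),
    ("Cardiology", "recovered"),
    ("Cardiology", "unchanged"),
    ("Surgery", "recovered"),
    ("Surgery", "unchanged"),
]


def success_rate_by_department(records):
    """Return the percentage of successful outcomes for each department."""
    totals = defaultdict(int)
    successes = defaultdict(int)
    for department, outcome in records:
        totals[department] += 1
        if outcome in SUCCESS_OUTCOMES:
            successes[department] += 1
    return {dept: 100.0 * successes[dept] / totals[dept] for dept in totals}


rates = success_rate_by_department(records)
print(rates)
```

The resulting dictionary maps directly onto a bar chart, with departments as categorical x-axis positions and percentages on the y-axis.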

Model Performance Metrics:

For predictive analysis, the dataset described above was loaded and preprocessed in Google Colab using Python, Pandas, and Scikit-learn. Categorical variables were encoded, missing values were imputed, and the data were divided into training and testing partitions. Two machine-learning algorithms, Logistic Regression and Random Forest, were implemented to classify patient length of stay. Each model was trained on the processed data, and accuracy scores were computed from test-set predictions. The same evaluation framework was applied to both models to ensure comparability.
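
A workflow of this kind can be sketched with Scikit-learn as shown below. This is an illustration under stated assumptions, not the study's code: the data are synthetic, the column names are hypothetical, and the specific choices (most-frequent and median imputation, one-hot encoding, feature scaling, an 80/20 stratified split, 100 trees) are assumptions, since the text does not report them.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for the hospital dataset (all columns hypothetical).
rng = np.random.default_rng(0)
n = 600
df = pd.DataFrame({
    "department": rng.choice(["A", "B", "C"], size=n),
    "severity": rng.choice(["minor", "moderate", "extreme"], size=n),
    "age": rng.integers(18, 90, size=n).astype(float),
    "deposit": rng.normal(5000, 1000, size=n),
})
# Three length-of-stay classes derived from age so that there is signal to learn.
y = np.where(df["age"] > 60, "21-30", np.where(df["age"] > 40, "11-20", "0-10"))
# Inject missing values so that the imputation step has work to do.
df.loc[rng.choice(n, size=30, replace=False), "age"] = np.nan
df.loc[rng.choice(n, size=30, replace=False), "severity"] = np.nan

categorical = ["department", "severity"]
numeric = ["age", "deposit"]


def build_pipeline(model):
    """Impute and encode features, then fit the given classifier."""
    preprocess = ColumnTransformer([
        ("cat", Pipeline([
            ("impute", SimpleImputer(strategy="most_frequent")),
            ("encode", OneHotEncoder(handle_unknown="ignore")),
        ]), categorical),
        ("num", Pipeline([
            ("impute", SimpleImputer(strategy="median")),
            # Scaling is an added assumption to help Logistic Regression converge.
            ("scale", StandardScaler()),
        ]), numeric),
    ])
    return Pipeline([("preprocess", preprocess), ("model", model)])


X_train, X_test, y_train, y_test = train_test_split(
    df[categorical + numeric], y, test_size=0.2, random_state=0, stratify=y
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Same split and same metric for both models, so the scores are comparable.
accuracy = {}
for name, model in models.items():
    clf = build_pipeline(model)
    clf.fit(X_train, y_train)
    accuracy[name] = accuracy_score(y_test, clf.predict(X_test))

print(accuracy)
```

Wrapping preprocessing and model in one pipeline means the imputers and encoder are fitted on the training partition only, which avoids leaking test-set information into the comparison.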

RESULTS:

Hospital Performance Indicators and Operational Efficiency Metrics:

The healthcare performance parameters were treated as essential indicators of hospital functioning and patient-care efficiency (Table 1). The bed occupancy rate was considered a critical measure of resource utilization: consistently high occupancy indicates that beds are used efficiently, although excessively high levels are associated with overcrowding and compromised care. The average length of stay (LOS) was monitored because shorter stays are associated with improved patient flow, reduced infection risk, and better bed availability, whereas longer stays are linked to higher costs and clinical complications. The readmission rate was treated as a major quality indicator, since frequent readmissions suggest inadequate treatment, poor discharge planning, or limited follow-up care. The patient satisfaction score was included because higher satisfaction correlates with better communication, safer care, and a stronger hospital reputation. The operational cost index was analyzed to gauge financial efficiency, with lower values associated with effective resource management and less waste across departments. Staff efficiency was assessed because higher scores are linked to timely service delivery, shorter patient waiting times, and smoother departmental coordination. Resource utilization measured how well hospital infrastructure, equipment, and services were deployed; high utilization was interpreted as strong operational performance, and underutilization as inefficiency or misallocation.
Prediction confidence was regarded as an indicator of model reliability, showing how consistently the system predicted patient stay patterns or workload demands; lower confidence was taken as a signal to improve data quality or model tuning. The risk score was used to identify potentially high-risk patient groups requiring close monitoring, additional interventions, or specialized attention. Finally, service delivery time was evaluated because shorter delivery times are associated with improved patient experience, better emergency handling, and fewer clinical delays.

Monthly Incidence Trends:

A clear upward trajectory in monthly incidence was observed once the data had been plotted (Figure 1). The recorded values indicated that the number of cases progressively increased from January to June, suggesting that a consistent rising trend had taken place across the six-month period. The lowest incidence was noted in January, whereas the highest incidence was documented in June, demonstrating that a 2.5-fold increase had occurred within the dataset. The slope of the plotted line became steeper after March, indicating that the pace of new case accumulation had accelerated during the latter months. No month-to-month decline was recorded, implying that the underlying factors driving incidence were either continuously active or intensified over time. The overall pattern reflected a persistent escalation rather than random fluctuations, and the smooth progression suggested that seasonal, environmental, or operational contributors might have been influencing the trend. Because the dataset had been visualized without smoothing or averaging, the pattern represented direct observational evidence rather than a statistical reconstruction. This upward pattern demonstrated that the healthcare burden had been increasingly concentrated in the latter half of the observed period, highlighting the need for better forecasting strategies.
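
The three properties reported above (the fold increase, the absence of any month-to-month decline, and the steeper rise after March) can be checked numerically. The sketch below uses hypothetical counts chosen only so that the June-to-January ratio equals the reported 2.5; the actual monthly values are not given in the text, and `summarize_trend` is an illustrative name.

```python
# Hypothetical January-to-June counts, chosen so that June / January = 2.5.
cases = [120, 140, 165, 210, 255, 300]


def summarize_trend(counts):
    """Fold change, monotonicity, and month-to-month changes of a series."""
    changes = [later - earlier for earlier, later in zip(counts, counts[1:])]
    return {
        "fold_change": counts[-1] / counts[0],
        "no_decline": all(change >= 0 for change in changes),
        "monthly_changes": changes,
    }


trend = summarize_trend(cases)
print(trend)
```

With these placeholder values the monthly changes are 20, 25, 45, 45, and 45, so the increments from March onward are larger than the earlier ones, which is what a steeper slope after March means in numerical terms.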

Department-wise Treatment Success Rate:

When the departmental success rates were visualized, distinct variations across specialties were observed (Figure 2). Cardiology demonstrated the highest success rate at 91%, indicating that treatment protocols within this department had yielded the most favorable outcomes. Pediatrics followed with an 88% success rate, suggesting that therapeutic interventions in younger populations had been largely effective. Medicine recorded an 84% success rate, representing stable but comparatively moderate outcomes. Surgery exhibited the lowest success rate at 79%, indicating that postoperative or procedural complexities may have influenced treatment completion. The spread between the highest and lowest departments showed that a 12-point variation had been present across the clinical units. These results suggested that departmental heterogeneity, differences in case severity, staffing experience, or resource availability might have contributed to variable success rates. The bar chart revealed that all departments maintained relatively high efficacy, as no value dropped below 75%, demonstrating that overall institutional performance had remained strong. Because the data were plotted without adjustments, the observed differences reflected genuine operational outcomes rather than statistically calibrated estimates.

Model Performance Metrics:

The two predictive models were evaluated, and their accuracy scores were compared to determine the more effective approach for forecasting hospital length of stay (Table 2). Logistic Regression was tested first and achieved an accuracy between 0.62 and 0.65. This result indicates that linear decision boundaries only partially capture the variability and complexity of patient-level and hospital-level attributes. Although acceptable, its predictive strength remained moderate.

Random Forest was then applied to the same dataset and achieved accuracy values ranging from 0.70 to 0.75. This improvement indicates that non-linear relationships among hospital parameters, severity indicators, and demographic characteristics were better represented by the ensemble-based technique. The capacity of Random Forest to aggregate multiple decision trees enabled more robust prediction of multi-class length-of-stay categories.

Overall, Random Forest showed superior predictive capability, suggesting that the data patterns influencing hospital stay are non-linear and complex. Random Forest was therefore identified as the more suitable model for hospital management decision-support applications.

DISCUSSION:

In the present analysis, a comprehensive set of healthcare management indicators was evaluated to understand the operational behaviour, service delivery performance, and systemic efficiency of the hospital system. Multiple managerial metrics were derived from the uploaded dataset, including bed occupancy rate, length of stay, patient satisfaction, staff efficiency, operational cost index, service delivery time, readmission rate, and risk scores. These indicators were interpreted collectively to highlight the functional strengths and structural gaps within the healthcare delivery chain.

A high bed occupancy rate (89%; Table 1) was observed, suggesting that hospital capacity was utilized to a major extent. Although such utilization is often desirable, sustained high occupancy has also been associated with increased operational strain and reduced flexibility in patient flow management. Similarly, the average length of stay (8 days) indicated moderately prolonged hospitalization, implying that resource turnover may have been slowed and that administrative planning may require further optimization⁸.

The readmission rate (17%) was treated as a critical indicator of care quality, allowing the overall continuity of care to be assessed. Higher readmission proportions are usually interpreted as a sign that treatment completeness or post-discharge planning was not fully ensured. In contrast, patient satisfaction was high (89%), implying that overall care delivery and patient-facing processes were positively perceived⁹.

The operational cost index (61 pts) was derived to represent financial efficiency and suggested moderate expenditure levels, indicating that resource allocation may have been balanced, although further micro-level financial analysis would be needed for a conclusive interpretation. Staff efficiency metrics were computed to reflect human resource productivity. High efficiency values (89%) were observed, suggesting that workforce performance remained consistent and service delivery targets were met¹⁰.

Risk scores and prediction confidence percentages were generated from the machine-learning model. These values provided a quantitative basis for risk stratification and early identification of patients requiring priority attention. Service delivery time was also captured, offering insight into administrative responsiveness and throughput stability¹¹.

The rising incidence trend was interpreted as evidence that demand on healthcare services had increased over time. The consistent upward movement suggested that external pressures, such as seasonal exposure, environmental triggers, or population-level behavioral patterns, might have been acting cumulatively. The sustained climb without any observed decline implied that case generation had not been successfully mitigated during the observed months. This pattern aligned with typical healthcare analytics observations, wherein incidence peaks often correspond to climatic changes, increased patient mobility, or delayed preventive interventions. Additionally, the accelerating rise after March was viewed as an indicator that internal hospital capacities may have been stretched, as growing caseloads tend to impact admission flow and resource consumption. The visualization also highlighted how early-period trends could be used to anticipate subsequent surges, suggesting that predictive modeling could have been applied to forecast future load and optimize staffing. The findings reinforced the importance of continuous surveillance because the observed trajectory, if extended, could lead to operational bottlenecks.

The observed variation among departmental success rates was interpreted as evidence that clinical performance had been shaped by discipline-specific workloads, resource distribution, and patient profiles. The superior outcomes in Cardiology suggested that evidence-based protocols, advanced diagnostics, or highly structured care pathways might have been successfully implemented in that unit. Conversely, the comparatively lower rate in Surgery implied that procedural risks, postoperative complications, or emergency caseload pressures could have influenced treatment consistency. These differences were viewed as an important indicator that inter-departmental disparities in process optimization may have existed. Understanding such variation allowed performance gaps to be identified, enabling targeted quality-improvement strategies. For example, departments with lower success rates could have benefited from workflow revisions, enhanced monitoring, or improved post-treatment follow-up systems. Additionally, benchmarking against higher-performing departments could have been used to redesign clinical pathways. The findings also highlighted the importance of allocating resources based on measured departmental needs rather than uniform distribution.

These findings were interpreted within the context of healthcare management, where accurate forecasting of patient length of stay is recognized as essential for resource allocation and operational planning. The superior performance of the Random Forest algorithm is attributed to its ability to capture complex patient pathways, multi-factor interactions, and non-linear hospital dynamics that were not adequately represented by Logistic Regression. The improvement in accuracy implies that multi-dimensional hospital datasets can be modelled more effectively through ensemble-based learning. The predictive strength of Random Forest is particularly meaningful for management decision-making: accurate length-of-stay prediction is associated with improved bed management, enhanced discharge planning, and optimized staff scheduling. Aggregating multiple decision trees also reduces the variability of individual predictions, resulting in more stable and generalizable output.

Collectively, the discussion suggests that machine learning, when appropriately selected, offers strong potential to improve healthcare operational efficiency and patient-flow management. Taken together, the parameters examined provided a multidimensional understanding of hospital function. By combining operational, financial, clinical, and predictive indicators, a more holistic management overview was achieved. Such integrated analytics frameworks are increasingly recommended in modern healthcare systems because they enable evidence-based decision-making, operational forecasting, and strategic resource planning.

Table 1: Hospital Performance Indicators and Operational Efficiency Metrics

Metric	Value
Bed Occupancy Rate	89%
Average Length of Stay	8 days
Readmission Rate	17%
Patient Satisfaction	89%
Operational Cost Index	61 pts
Staff Efficiency	89%
Resource Utilization	69%
Prediction Confidence	19%
Risk Score	0.19
Service Delivery Time	31 mins

Table 2: Predictive Performance of Machine Learning Models for Hospital Outcome Classification

Model Name	Accuracy
Logistic Regression	0.62–0.65
Random Forest	0.70–0.75

Figure 1: Monthly Incidence Trend in the Healthcare Dataset

Figure 2: Department-wise Treatment Success Rate Distribution

CONCLUSION:

In the present study, a comprehensive healthcare analytics framework was constructed and used to evaluate critical operational, clinical, and predictive indicators of hospital performance. Of the two machine-learning models compared, Random Forest provided the higher predictive accuracy (0.70–0.75 versus 0.62–0.65 for Logistic Regression), enabling more reliable assessment of patient length of stay and associated resource requirements. Management parameters, including occupancy, efficiency, satisfaction, and risk, were integrated to generate a multidimensional operational profile. Overall, the findings demonstrate that data-driven modelling, combined with performance metrics, can be used effectively to strengthen decision-making, enhance capacity planning, and support strategic improvements in healthcare delivery.

REFERENCES:

1. Rahman MA, Moayedikia A, Wiil UK. Data-driven technologies for future healthcare systems. Front Med Technol. 2023; 5: 1183687.

2. Amri MM, Abed SA. The data-driven future of healthcare: a review. Mesopotamian J Big Data. 2023; 2023: 68–74.

3. Stylianou N, Young J, Peden CJ, Vasilakis C. Developing and validating a predictive model for future emergency hospital admissions. Health Informatics J. 2022; 28(2): 14604582221101538.

4. Gupta S, Saluja K, Goyal A, Vajpayee A, Tiwari V. Comparing the performance of machine learning algorithms using estimated accuracy. Meas Sensors. 2022; 24: 100432.

5. Imani A, Alibabayee R, Golestani M, Dalal K. Key indicators affecting hospital efficiency: a systematic review. Front Public Health. 2022; 10: 830102.

6. Akter M, Kudapa SP. A comparative analysis of artificial intelligence-integrated BI dashboards for real-time decision support in operations. Int J Sci Interdiscip Res. 2024; 5(2): 158–91.

7. Prabhavalkar N. AV: Healthcare Analytics II [Internet]. 2020. Available from: https://www.kaggle.com/datasets/nehaprabhavalkar/av-healthcare-analytics-ii

8. Bosque-Mercader L, Siciliani L. The association between bed occupancy rates and hospital quality in the English National Health Service. Eur J Health Econ. 2023; 24(2): 209–36.

9. Schultz BE, Corbett CF, Hughes RG, Bell N. Scoping review: Social support impacts hospital readmission rates. J Clin Nurs. 2022; 31(19–20): 2691–705.

10. Kamel MA, Mousa MES. Measuring operational efficiency of isolation hospitals during COVID-19 pandemic using data envelopment analysis: a case of Egypt. Benchmarking An Int J. 2021; 28(7): 2178–201.

11. Dreyfus J, Audureau E, Bohbot Y, Coisne A, Lavie-Badie Y, Bouchery M, et al. TRI-SCORE: a new risk score for in-hospital mortality prediction after isolated tricuspid valve surgery. Eur Heart J. 2022; 43(7): 654–62.

Received on 09.12.2025 Revised on 25.12.2025

Accepted on 10.01.2026 Published on 11.05.2026

Available online from May 14, 2026

Asian Journal of Management. 2026;17(2):167-172.

DOI: 10.52711/2321-5763.2026.00026

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.